Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core][Docker] Support docker login on RunPod. #4287

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

cblmemo
Copy link
Collaborator

@cblmemo cblmemo commented Nov 7, 2024

Fixes #4269.

TO DISCUSS:

Currently RunPod Python API does not provide enough API to delete the template, so we manually call the graphql api instead. I've requested those features in the RunPod python sdk repo: runpod/runpod-python#390

Sometimes the deletion of registry auth will fail without a clear error message. Cleanup of template is always successful though. I also submitted this: runpod/runpod-python#391

$ sky down rp-docc-3
Terminating 1 cluster: rp-docc-3. Proceed? [Y/n]: 
Terminating 1 cluster ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--{'errors': [{'message': 'Something went wrong. Please try again later or contact support.', 'path': ['deleteRegistryAuth'], 'extensions': {'code': 'INTERNAL_SERVER_ERROR'}}], 'data': {'deleteRegistryAuth': None}}
Failed to delete registry auth cm56jeayk0001l208aehw3oe9: Something went wrong. Please try again later or contact support.Please delete it manually.
Terminating cluster rp-docc-3...done.
Terminating 1 cluster ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
# @temp/rp_docker.yaml
resources:
  cloud: runpod
  image_id: docker:txia-test-ecr:latest

envs:
  SKYPILOT_DOCKER_USERNAME: AWS
  SKYPILOT_DOCKER_PASSWORD:
  SKYPILOT_DOCKER_SERVER: 679991763071.dkr.ecr.us-east-1.amazonaws.com

run: which nginx
$ SKYPILOT_DOCKER_PASSWORD=$(aws ecr get-login-password --region us-east-1) sky launch @temp/rp_docker.yaml -c rp-doc --env SKYPILOT_DOCKER_PASSWORD 'which nginx'
⚙︎ Launching on RunPod NL.
└── Instance is up.
✓ Cluster launched: rp-doc.  View logs at: ~/sky_logs/sky-2024-12-27-00-10-55-296823/provision.log
⚙︎ Mounting files.
⚙︎ Job submitted, ID: 1
├── Waiting for task resources on 1 node.
└── Job started. Streaming logs... (Ctrl-C to exit log streaming; job will not be killed)
(task, pid=2396) /usr/sbin/nginx
✓ Job finished (status: SUCCEEDED).
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

@cblmemo
Copy link
Collaborator Author

cblmemo commented Nov 8, 2024

Maybe they are including the server name in the image name.

@cblmemo cblmemo marked this pull request as ready for review December 27, 2024 09:29
@cblmemo
Copy link
Collaborator Author

cblmemo commented Dec 27, 2024

@Michaelvll This is ready for a review!

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cblmemo! Left some comment for the new API added for provision_record

sky/provision/common.py Outdated Show resolved Hide resolved
sky/provision/runpod/instance.py Outdated Show resolved Hide resolved
sky/backends/cloud_vm_ray_backend.py Outdated Show resolved Hide resolved
@cblmemo
Copy link
Collaborator Author

cblmemo commented Jan 7, 2025

@Michaelvll After a thorough search, I found another doc page for runpod and managed to implement get resoruces from cluster name. The ephemeral_resources field is deleted and now we won't overwrite the cluster yaml during provisioning. This is ready for another look!

https://graphql-spec.runpod.io/#definition-PodTemplate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

runpod docker credentials not working when using image_id from private repository
2 participants